brier curve
Evaluating classification performance across operating contexts: A comparison of decision curve analysis and cost curves
Millard, Louise AC, Flach, Peter A
Classification models typically predict a score and use a decision threshold to produce a classification. Appropriate model evaluation should carefully consider the context in which a model will be used, including the relative value of correct classifications of positive versus negative examples, which affects the threshold that should be used. Decision curve analysis (DCA) and cost curves are model evaluation approaches that assess the expected utility and expected loss of prediction models, respectively, across decision thresholds. We compared DCA and cost curves to determine how they are related, and their strengths and limitations. We demonstrate that decision curves are closely related to a specific type of cost curve called a Brier curve. Both curves are derived assuming model scores are calibrated and setting the classification threshold using the relative value of correct positive and negative classifications, and the x-axes of the two curves are equivalent. Net benefit (used for DCA) and Brier loss (used for Brier curves) will always choose the same model as optimal at any given threshold. Across thresholds, differences in Brier loss are comparable whereas differences in net benefit cannot be compared. Brier curves are more generally applicable (when a wider range of thresholds is plausible), and the area under the Brier curve is the Brier score. We demonstrate that reference lines common in each space can be included in either and suggest the upper envelope decision curve as a useful comparison for DCA showing the possible gain in net benefit that could be achieved through recalibration alone.
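The relationship described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes the usual net benefit definition, TP/N - FP/N * t/(1 - t), and a Brier loss normalised as 2 * (t * FP/N + (1 - t) * FN/N) with the decision threshold set equal to the cost proportion t; the factor 2, the function names and the toy data are all assumptions made for illustration. Under these assumptions, net benefit equals prevalence - loss / (2 * (1 - t)), so at any fixed threshold the model with the lower Brier loss also has the higher net benefit, and the area under the Brier curve approximates the Brier score.

```python
import numpy as np

def rates(y, s, t):
    """Return (TP/N, FP/N, FN/N) when scores above t are classified positive."""
    pred = s > t
    n = len(y)
    tp = np.sum(pred & (y == 1)) / n
    fp = np.sum(pred & (y == 0)) / n
    fn = np.sum(~pred & (y == 1)) / n
    return tp, fp, fn

def net_benefit(y, s, t):
    """Net benefit at threshold t (defined for 0 <= t < 1)."""
    tp, fp, _ = rates(y, s, t)
    return tp - fp * t / (1 - t)

def brier_loss(y, s, t):
    """Cost-weighted loss at cost proportion t, with the threshold also set to t.
    The factor 2 is an assumed normalisation."""
    _, fp, fn = rates(y, s, t)
    return 2 * (t * fp + (1 - t) * fn)

def trapezoid_area(x, f):
    """Trapezoidal rule, written out to avoid depending on a NumPy version."""
    return float(np.sum((f[1:] + f[:-1]) / 2 * np.diff(x)))

# Toy labels and scores, made up for illustration only.
y = np.array([0, 0, 1, 0, 1, 1, 0, 1])
s = np.array([0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9])

# Area under the Brier curve versus the Brier score.
grid = np.linspace(0.0, 1.0, 10001)
curve = np.array([brier_loss(y, s, t) for t in grid])
area = trapezoid_area(grid, curve)
brier_score = float(np.mean((s - y) ** 2))

# Net benefit recovered from Brier loss at a single threshold.
prevalence = float(np.mean(y))
t = 0.3
nb = net_benefit(y, s, t)
nb_from_loss = prevalence - brier_loss(y, s, t) / (2 * (1 - t))

print(area, brier_score)
print(nb, nb_from_loss)
```

The rescaling factor 1 / (2 * (1 - t)) changes with the threshold, which is one way to see why differences in net benefit at different thresholds are not on a common scale while differences in Brier loss are.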
Calibrating sufficiently
Binary classification, in the first place, deals with decision tools (classifiers) that facilitate the prediction of the classes of instances on the basis of the so-called features of the instances. Accordingly, the simplest classifiers are crisp (or discrete) in the sense of having the set {0, 1} as output range: 1 for 'predict positive class', 0 for 'predict negative class'. Scoring (or soft) classifiers provide output in a continuous range, usually with the interpretation that high values indicate high likelihood of the instance belonging to the positive class, while low values suggest that membership of the negative class is more likely. In many applications of classification, there is a need for 'calibrated' probabilistic classifiers which reflect the likelihood of the positive class given the features of an instance in a frequentist statistical sense (Platt, 2000; Zadrozny and Elkan, 2002; Cohen and Goldszmidt, 2004; Kull et al., 2017). How to best achieve good calibration and how to measure it are active research areas (Böken, 2021; Roelofs et al., 2020).
Technical Note: Towards ROC Curves in Cost Space
Hernández-Orallo, José, Flach, Peter, Ferri, Cèsar
ROC curves and cost curves are two popular ways of visualising classifier performance, finding appropriate thresholds according to the operating condition, and deriving useful aggregated measures such as the area under the ROC curve (AUC) or the area under the optimal cost curve. In this note we present some new findings and connections between ROC space and cost space, by using the expected loss over a range of operating conditions. In particular, we show that ROC curves can be transferred to cost space by means of a very natural way of understanding how thresholds should be chosen, by selecting the threshold such that the proportion of positive predictions equals the operating condition (either in the form of cost proportion or skew). We call these new curves ROC Cost Curves, and we demonstrate that the expected loss as measured by the area under these curves is linearly related to AUC. This opens up a series of new possibilities and clarifies the notion of cost curve and its relation to ROC analysis. In addition, we show that for a classifier that assigns the scores in an evenly-spaced way, these curves are equal to the Brier Curves. As a result, this establishes the first clear connection between AUC and the Brier score.
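The rate-driven threshold choice described in this abstract can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation, and its conventions are assumptions: class 1 is the positive class, high scores indicate the positive class, the cost proportion c weights false positives, and the loss is normalised as 2 * (c * FP/N + (1 - c) * FN/N). With c weighting false positives, the fraction of instances predicted positive is set to 1 - c; the paper's own class and cost conventions may differ. Under these assumptions, a short derivation suggests the area under the resulting curve is 1/3 + pi0 * pi1 * (1 - 2 * AUC), where pi0 and pi1 are the class proportions, which is one concrete form of the linear relation with AUC. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def auc(y, s):
    """Fraction of (positive, negative) pairs ranked correctly (ties ignored)."""
    pos, neg = s[y == 1], s[y == 0]
    return float(np.mean(pos[:, None] > neg[None, :]))

def rate_driven_loss(y, s, c):
    """Loss at cost proportion c when the top (1 - c) fraction of
    scores is predicted positive (rate-driven threshold choice)."""
    n = len(y)
    k = int(round((1 - c) * n))      # number of positive predictions
    order = np.argsort(-s)           # indices sorted by descending score
    pred = np.zeros(n, dtype=bool)
    pred[order[:k]] = True
    fp = np.sum(pred & (y == 0)) / n
    fn = np.sum(~pred & (y == 1)) / n
    return 2 * (c * fp + (1 - c) * fn)

# Synthetic data, made up for illustration: positives score higher on average.
rng = np.random.default_rng(0)
y_rd = rng.integers(0, 2, size=2000)
s_rd = rng.normal(loc=y_rd, scale=1.0)

# Numerical area under the rate-driven cost curve (trapezoidal rule).
c_grid = np.linspace(0.0, 1.0, 1001)
rd_curve = np.array([rate_driven_loss(y_rd, s_rd, c) for c in c_grid])
rd_area = float(np.sum((rd_curve[1:] + rd_curve[:-1]) / 2 * np.diff(c_grid)))

# Value predicted from AUC under the assumed normalisation.
pi1 = float(np.mean(y_rd))
pi0 = 1.0 - pi1
predicted_area = 1 / 3 + pi0 * pi1 * (1 - 2 * auc(y_rd, s_rd))

print(rd_area, predicted_area)
```

On a finite sample the two printed numbers should agree only approximately, because the curve is evaluated on a grid and the number of positive predictions is rounded to an integer.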